Exploratory Mediation Analysis with Many Potential Mediators
Social and behavioral scientists are increasingly employing technologies such
as fMRI, smartphones, and gene sequencing, which yield 'high-dimensional'
datasets with more columns than rows. There is increasing interest, but little
substantive theory, in the role the variables in these data play in known
processes. This necessitates exploratory mediation analysis, for which
structural equation modeling is the benchmark method. However, this method
cannot perform mediation analysis with more variables than observations. One
option is to run a series of univariate mediation models, which incorrectly
assumes independence of the mediators. Another option is regularization, but
the available implementations may lead to high false positive rates. In this
paper, we develop a hybrid approach which uses components of both filter and
regularization: the 'Coordinate-wise Mediation Filter'. It performs filtering
conditional on the other selected mediators. We show through simulation that it
improves performance over existing methods. Finally, we provide an empirical
example, showing how our method may be used for epigenetic research.
Comment: R code and package are available online as supplementary material at
https://github.com/vankesteren/cmfilter and
https://github.com/vankesteren/ema_simulation
The Expected Parameter Change (EPC) for local dependence assessment in binary data latent class models
Binary data latent class models crucially assume local independence,
violations of which can seriously bias the results. We present two tools for
monitoring local dependence in binary data latent class models: the "Expected
Parameter Change" (EPC) and a generalized EPC, estimating the substantive size
and direction of possible local dependencies. The asymptotic and finite sample
behavior of the measures is studied, and two applications to the U.S. Census
estimation of Hispanic ethnicity and medical experts' ratings of x-rays
demonstrate their value in arriving at a model that balances realism and
parsimony.
Comment: R code implementing our proposal and including both example datasets
is available online as supplementary material
Evaluating the Construct Validity of Text Embeddings with Application to Survey Questions
Text embedding models from Natural Language Processing can map text data
(e.g. words, sentences, documents) to supposedly meaningful numerical
representations (a.k.a. text embeddings). While such models are increasingly
applied in social science research, one important issue is often not addressed:
the extent to which these embeddings are valid representations of constructs
relevant for social science research. We therefore propose the use of the
classic construct validity framework to evaluate the validity of text
embeddings. We show how this framework can be adapted to the opaque and
high-dimensional nature of text embeddings, with application to survey
questions. We include several popular text embedding methods (e.g. fastText,
GloVe, BERT, Sentence-BERT, Universal Sentence Encoder) in our construct
validity analyses. We find evidence of convergent and discriminant validity in
some cases. We also show that embeddings can be used to predict respondents'
answers to completely new survey questions. Furthermore, BERT-based embedding
techniques and the Universal Sentence Encoder provide more valid
representations of survey questions than do others. Our results thus highlight
the necessity to examine the construct validity of text embeddings before
deploying them in social science research.
Comment: Under review
Differential privacy and social science: An urgent puzzle
Accessing and combining large amounts of data is important for quantitative social scientists, but increasing amounts of data also increase privacy risks. To mitigate these risks, important players in official statistics, academia, and business see a solution in the concept of differential privacy. In this opinion piece, we ask how differential privacy can benefit from social-scientific insights, and, conversely, how differential privacy is likely to transform social science. First, we put differential privacy in the larger context of social science. We argue that the discussion on implementing differential privacy has been clouded by incompatible subjective beliefs about risk, each perspective having merit for different data types. Moreover, we point out existing social-scientific insights that suggest limitations to the premises of differential privacy as a data protection approach. Second, we examine the likely consequences for social science if differential privacy is widely implemented. Clearly, workflows must change, and common social science data collection will become more costly. However, in addition to data protection, differential privacy may bring other positive side effects. These could solve some issues social scientists currently struggle with, such as p-hacking, data peeking, or overfitting; after all, differential privacy is basically a robust method to analyze data. We conclude that, in the discussion around privacy risks and data protection, a large number of disciplines must band together to solve this urgent puzzle of our time, including social science, computer science, ethics, law, and statistics, as well as public and private policy.